-
Notifications
You must be signed in to change notification settings - Fork 141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LightDM failsafe handler and restart directives #340
LightDM failsafe handler and restart directives #340
Conversation
Everyone contributing to this PR have now signed the CLA. Thanks! |
I signed this, btw, but maybe my launchpad ID was supposed to be my launchpad account email? |
Can you please resolve the above conflicts? Thanks. |
Sorry, @wallentx needs a rebase again. |
This is great @wallentx I want this too! Its rather maddening when you get caught in a restart loop and as you are at some tty, the restart occurs and in the middle of typing you get flipped back to tty1 as things fail. Having some back off / retry limit would be great ootb config! |
939c34d
to
0c3b360
Compare
Hey! wallentx has not signed the Canonical CLA which is required to get this contribution merged on this project.
Please head over to https://ubuntu.com/legal/contributors to read more about it. |
Hey @robert-ancell I've been held up on attempting to fix my Canonical CLA issue because I found a bug/exploit/bypass with the Canonical cla-check itself, and have a PR that shows it's existence, and also includes the fix canonical/has-signed-canonical-cla#38 If I fix my CLA, then the evidence of the flaw disappears. Do you know anyone who might be able to at least ack the details of my other PR so that I can be freed up to move forward here? |
@wallentx thanks for working on this! I've pinged the person who seems to be maintaining the CLA checker and hopefully they'll land that fix soon :) |
This PR is in reference to #337
In a few different scenarios, I've experienced situations where a child process spawned by LightDM (such as x11) fails to start. As per the instructions in lightdm.service, upon failure, LightDM will
Restart=always
.The problem that can come from this, is when one of those services are failing for a very specific reason that requires the user to make changes. In a scenario where such a service will fail 100% of the time, this will cause LightDM to
Restart=always
100% of the time, when will eventually trigger systemd's rate-limiting mechanism, resulting in logs that would appear like this:The problem is that, depending on the failure, the user might only see a black screen, with no indication that anything is wrong, except that the screen is black, and nothing is loading. The user would only be able to discover details in the aforementioned logs if they were to switch to a different TTY, log in, and check their logs. From here, they would be able to discover why the failure happened, and fix the issue.
The changes that I am proposing are essentially a shortcut to what I just described.
The
lightdm.service
file now includes StartLimit IntervalSec and Burst values, a RestartSec rate, and an OnFailure action. The values I've proposed I think make sense:The service has a restart delay of 5 seconds in between resets, and if the service restarts more than 5 times within a 60s period, then the service will be in a failed state. In the event of it being in a failed state, it triggers the OnFailure action:debian/lightdm-failure-handler.service
This service makes a copy of
/etc/issue
(the MOTD-like message you see above a TTY login) to/etc/issue.restore
, and writes in a new message "hint" to/etc/issue
that lets the user know what to do. It then changes the user toTTY4
(I only chose 4 because I've noticed others to be reserved for other things, or unavailable.. someone should check my reasoning here though) so that they can see that message, and log in to troubleshoot.Upon successful login of LightDM, if an
/etc/issue.restore
file exists, it will replace/etc/issue
(which presumably has the "hint" we wrote earlier). If no .restore file is found, no action is taken.I couldn't find anything else in the repo specifically that referenced the packaging or manifest of the .service files, but maybe that is handled elsewhere.